Count-Min-Log sketch: Approximately counting with approximate counters

Authors

  • Guillaume Pitel
  • Geoffroy Fouquier
Abstract

Count-Min Sketch [1] is a widely adopted algorithm for approximate event counting in large-scale processing. However, the original version of the Count-Min Sketch (CMS) suffers from some deficiencies, especially when one is interested in low-frequency items, as in text-mining tasks. Several variants of CMS [5] have been proposed to compensate for the high relative error on low-frequency events, but these solutions tend to correct the errors rather than prevent them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based approximate counters [7, 4] instead of linear counters to improve the average relative error of CMS at a constant memory footprint.
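To make the idea concrete, here is a minimal Python sketch of a Count-Min sketch whose cells hold Morris-style logarithmic counters. It is an illustration under stated assumptions, not the authors' implementation: the class name `CountMinLogSketch`, the SHA-256-based row hashing, the default `base` of 1.02, and the use of a conservative update are all choices made for this example.

```python
import hashlib
import random


class CountMinLogSketch:
    """Illustrative Count-Min sketch with logarithmic (Morris-style) cells.

    Assumptions for this example only: SHA-256 row hashing, a conservative
    update, and a counter base of 1.02.
    """

    def __init__(self, width=1024, depth=4, base=1.02, seed=0):
        self.width = width
        self.depth = depth
        self.base = base
        self.rng = random.Random(seed)
        # Each cell stores a small exponent rather than a linear count.
        self.cells = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        positions = [(row, self._index(item, row)) for row in range(self.depth)]
        lowest = min(self.cells[row][col] for row, col in positions)
        # Increment with probability base**(-lowest), as in a Morris counter.
        if self.rng.random() < self.base ** (-lowest):
            for row, col in positions:
                # Conservative update: only raise the cells at the minimum.
                if self.cells[row][col] == lowest:
                    self.cells[row][col] += 1

    def estimate(self, item):
        lowest = min(
            self.cells[row][self._index(item, row)] for row in range(self.depth)
        )
        # Invert the logarithmic encoding to recover an approximate count.
        return (self.base ** lowest - 1) / (self.base - 1)
```

A smaller `base` gives more accurate estimates but needs larger exponents for the same maximum count, so the base trades precision against the number of bits each cell requires.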


Related resources

Count-Min Tree Sketch: Approximate counting for NLP

The Count-Min Sketch [1] is a widely adopted structure for approximate event counting in large-scale processing. In a previous work [7] we improved the original version of the Count-Min Sketch (CMS) with conservative update using approximate counters [6, 4] instead of linear counters. These structures are computationally efficient and improve the average relative error (ARE) of a CMS at constant...


Approximate Scalable Bounded Space Sketch for Large Data NLP

We exploit sketch techniques, especially the Count-Min sketch, a memory- and time-efficient framework that approximates the frequency of a word pair in the corpus without explicitly storing the word pair itself. These methods use hashing to deal with massive amounts of streaming text. We apply the Count-Min sketch to approximate word pair counts and exhibit their effectiveness on three important NL...


Sketching Techniques for Large Scale NLP

In this paper, we address the challenges posed by large amounts of text data by exploiting the power of hashing in the context of streaming data. We explore sketch techniques, especially the CountMin Sketch, which approximates the frequency of a word pair in the corpus without explicitly storing the word pairs themselves. We use the idea of a conservative update with the Count-Min Sketch to red...


A Coin Tossing Algorithm for Counting Large Numbers of Events

"Approximate counters" are realized by probabilistic algorithms that maintain an approximate count in the interval 1 to n using only about $\log_2 \log_2 n$ bits. The algorithmic principle was proposed by R. Morris [7]: Starting with counter C = 1, after n increments C should contain a good approximation to $\log_2 n$. Thus C should be increased by 1 after other n increments approximately. Since ...


Approximate Counting: A Detailed Analysis

Approximate counting is an algorithm proposed by R. Morris which makes it possible to keep approximate counts of large numbers in small counters. The algorithm is useful for gathering statistics of a large number of events as well as for applications related to data compression (Todd et al.). We provide here a complete analysis of approximate counting which establishes good convergence properti...



Journal:
  • CoRR

Volume abs/1502.04885

Publication year 2015